
    Identifying elemental genomic track types and representing them uniformly

    Background: With the recent advances and availability of various high-throughput sequencing technologies, data on many molecular aspects, such as gene regulation, chromatin dynamics, and the three-dimensional organization of DNA, are rapidly being generated in an increasing number of laboratories. The variation in biological context, and the increasingly dispersed mode of data generation, imply a need for precise, interoperable and flexible representations of genomic features through formats that are easy to parse. A host of alternative formats are currently available and in use, complicating analysis and tool development. The issue of whether and how the multitude of formats reflects varying underlying characteristics of data has to our knowledge not previously been systematically treated. Results: We here identify intrinsic distinctions between genomic features, and argue that the distinctions imply that a certain variation in the representation of features as genomic tracks is warranted. Four core informational properties of tracks are discussed: gaps, lengths, values and interconnections. From this we delineate fifteen generic track types. Based on the track type distinctions, we characterize major existing representational formats and find that the track types are not adequately supported by any single format. We also find, in contrast to the XML formats, that none of the existing tabular formats are conveniently extendable to support all track types. We thus propose two unified formats for track data, an improved XML format, BioXSD 1.1, and a new tabular format, GTrack 1.0. Conclusions: The defined track types are shown to capture relevant distinctions between genomic annotation tracks, resulting in varying representational needs and analysis possibilities. The proposed formats, GTrack 1.0 and BioXSD 1.1, cater to the identified track distinctions and emphasize preciseness, flexibility and parsing convenience.
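One way to see how four properties can yield fifteen track types is to treat each property as present or absent and discard the single combination in which none is present. The sketch below does only that counting. Treating the properties as binary flags, excluding the empty combination, and the names `PROPERTIES` and `enumerate_track_types` are all assumptions made for illustration; the abstract does not state how the fifteen types are derived or named.

```python
from itertools import product

# The four informational properties named in the abstract.
PROPERTIES = ("gaps", "lengths", "values", "interconnections")


def enumerate_track_types():
    """Return every on/off combination of the four properties except the empty one."""
    combos = []
    for flags in product((False, True), repeat=len(PROPERTIES)):
        if not any(flags):
            # Assumption: a track with none of the four properties carries
            # no information, so it is not counted as a track type.
            continue
        combos.append(dict(zip(PROPERTIES, flags)))
    return combos


track_types = enumerate_track_types()
print(len(track_types))  # 2**4 - 1 = 15
```

Under this reading, a track of plain segments would correspond to gaps and lengths being present while values and interconnections are absent.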

    The Genomic HyperBrowser: inferential genomics at the sequence level

    The immense increase in the generation of genomic scale data poses an unmet analytical challenge, due to a lack of established methodology with the required flexibility and power. We propose a first principled approach to statistical analysis of sequence-level genomic information. We provide a growing collection of generic biological investigations that query pairwise relations between tracks, represented as mathematical objects, along the genome. The Genomic HyperBrowser implements the approach and is available at http://hyperbrowser.uio.no
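As a concrete illustration of a pairwise relation between two tracks, the sketch below counts the base pairs covered by both of two segment tracks. It is a minimal example, not the HyperBrowser's implementation: representing a track as a list of half-open `(start, end)` tuples on one chromosome, and the function names `merge` and `overlap_bp`, are assumptions made here.

```python
def merge(segments):
    """Sort half-open (start, end) segments and merge any that overlap or touch."""
    merged = []
    for start, end in sorted(segments):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlap_bp(track_a, track_b):
    """Count base pairs covered by both tracks using a two-pointer sweep."""
    a, b = merge(track_a), merge(track_b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever segment ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


# Hypothetical example tracks.
genes = [(100, 200), (400, 500)]
peaks = [(150, 250), (450, 460), (600, 700)]
print(overlap_bp(genes, peaks))  # 60
```

An observed overlap like this only becomes an inferential result once it is compared with what a null model would produce.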

    Prediction of long-term remission in patients following discontinuation of anti-TNF therapy in ulcerative colitis: a 10 year follow up study

    Background - The long-term outcomes of ulcerative colitis (UC) after discontinuation of biological therapy are largely unknown. There is also a lack of accurate, validated markers that can predict outcome after withdrawal. The aims of this study were to describe the long-term outcomes in UC patients following cessation of anti-TNF therapy and to explore potential biomarkers as an approach towards precision medicine. Methods - Seventy-five patients with moderate to severe UC treated to remission with anti-tumor necrosis factor (TNF) were included in the study. This is a follow-up of previously reported UC outcomes. The patients were categorized as either “Remission” or “Relapse”. The “Relapse” group was divided into subgroups determined by the highest treatment level needed to obtain remission during the last 3 years of observation: non-biological therapy, biological therapy or colectomy. The “Remission” group was divided into long-term remission (LTR) subgroups: those using immunomodulating drugs (LTR + imids) and those using only 5-aminosalicylate (5-ASA) treatment (LTR) for the past 3 years. Analyses of mucosal gene expression by real-time PCR were performed. Results - The median (IQR) observation time of all patients included was 121 (111–137) months. Of the 75 patients, 46 (61%) did not receive biological therapy, including 23 (31%) in LTR ± imids. Of these 23 patients, 16 (21%) were defined as LTR, with a median (IQR) observation time of 95 (77–113) months. In total, 14 patients (19%) underwent colectomy during the 10 years after first remission. Mucosal TNF copies/µg mRNA  Conclusion - In this 10-year follow-up of UC patients with moderate to severe disease, 61% of patients experienced an altered phenotype with a milder disease course without need of biological therapy. Twenty-one percent of the patients were in LTR without any medication except 5-ASA. Mucosal TNF gene expression and IL1RL1 transcripts may be of clinical utility for long-term prognosis in the development of precision medicine in UC.

    The Genomic HyperBrowser: an analysis web server for genome-scale data

    The immense increase in availability of genomic scale datasets, such as those provided by the ENCODE and Roadmap Epigenomics projects, presents unprecedented opportunities for individual researchers to pose novel falsifiable biological questions. With this opportunity, however, researchers are faced with the challenge of how to best analyze and interpret their genome-scale datasets. A powerful way of representing genome-scale data is as feature-specific coordinates relative to reference genome assemblies, i.e. as genomic tracks. The Genomic HyperBrowser (http://hyperbrowser.uio.no) is an open-ended web server for the analysis of genomic track data. By providing several highly customizable components for processing and statistical analysis of genomic tracks, the HyperBrowser opens up a range of genomic investigations related to, e.g., gene regulation, disease association or epigenetic modifications of the genome.

    GSuite HyperBrowser: integrative analysis of dataset collections across the genome and epigenome

    Background: Recent large-scale undertakings such as ENCODE and Roadmap Epigenomics have generated experimental data mapped to the human reference genome (as genomic tracks) representing a variety of functional elements across a large number of cell types. Despite the high potential value of these publicly available data for a broad variety of investigations, little attention has been given to the analytical methodology necessary for their widespread utilisation. Findings: We here present a first principled treatment of the analysis of collections of genomic tracks. We have developed novel computational and statistical methodology to permit comparative and confirmatory analyses across multiple and disparate data sources. We delineate a set of generic questions that are useful across a broad range of investigations and discuss the implications of choosing different statistical measures and null models. Examples include contrasting analyses across different tissues or diseases. The methodology has been implemented in a comprehensive open-source software system, the GSuite HyperBrowser. To make the functionality accessible to biologists, and to facilitate reproducible analysis, we have also developed a web-based interface providing an expertly guided and customizable way of utilizing the methodology. With this system, many novel biological questions can flexibly be posed and rapidly answered. Conclusions: Through a combination of streamlined data acquisition, interoperable representation of dataset collections, and customizable statistical analysis with guided setup and interpretation, the GSuite HyperBrowser represents a first comprehensive solution for integrative analysis of track collections across the genome and epigenome. 
The software is available at: https://hyperbrowser.uio.no. This work was supported by the Research Council of Norway (under grant agreements 221580, 218241, and 231217/F20), by the Norwegian Cancer Society (under grant agreements 71220’PR-2006-0433 and 3485238-2013), and by the South-Eastern Norway Regional Health Authority (under grant agreement 2014041).

    Representation and integrated analysis of heterogeneous genomic datasets

    The technological developments in molecular biology over the last 50 years have brought with them a gradual shift of focus from genes and proteins towards the biological activity in non-coding parts of the genome. High-throughput sequencing techniques have opened a floodgate of published whole-genome datasets of experimental nature, such as ChIP-seq or variation data, which has accelerated this development. The shift towards non-coding regions of DNA has increased the need for representing data as genomic tracks, i.e. with coordinates along a reference genome. Consequently, a demand for user-friendly tools for analyzing such tracks has arisen. This thesis presents a conceptual differentiation of genomic tracks into fifteen track types, such as points or segments, and it argues that the track types of study determine which questions are meaningful to ask. Furthermore, the thesis presents “The Genomic HyperBrowser” (http://hyperbrowser.uio.no), a general web-based system for statistical analysis of genomic tracks. The system incorporates a range of hypothesis tests for answering questions about particular relations between datasets. The calculation of p-values is mostly based upon Monte Carlo simulation under a user-selected null model. In addition, a number of descriptive statistics and data manipulation tools have been developed. The thesis also introduces “GTrack”, a new file format for most types of genomic data, supporting all of the fifteen track types. The file format, and its binary variant, is fully supported by the Genomic HyperBrowser, providing the backbone for flexible high-speed analysis within the system. Lastly, the thesis presents “The differential disease regulome”, a hypothesis-generating tool providing a powerful way to visualize relations between two classes of datasets. In the main case, transcription factors are related to disease genes. 
Because of its large size, the resulting heatmap of ~500,000 relations is browsable via the Google Maps engine.
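The Monte Carlo calculation of p-values mentioned above can be sketched as follows. This is a simplified illustration, not the HyperBrowser's code. It makes several assumptions: the test statistic is the number of points falling inside segments, the null model places the same number of points uniformly at random along a single chromosome, and the names `count_inside` and `monte_carlo_p_value` and the example coordinates are hypothetical.

```python
import random
from bisect import bisect_right


def count_inside(points, segments):
    """Count points inside sorted, non-overlapping half-open (start, end) segments."""
    starts = [start for start, _ in segments]
    hits = 0
    for p in points:
        k = bisect_right(starts, p) - 1  # last segment starting at or before p
        if k >= 0 and p < segments[k][1]:
            hits += 1
    return hits


def monte_carlo_p_value(points, segments, genome_length, n_samples=999, seed=0):
    """One-sided p-value for 'more points inside segments than expected by chance'."""
    rng = random.Random(seed)
    observed = count_inside(points, segments)
    at_least = 0
    for _ in range(n_samples):
        # Assumed null model: same number of points, uniform positions.
        sampled = [rng.randrange(genome_length) for _ in points]
        if count_inside(sampled, segments) >= observed:
            at_least += 1
    # Counting the observed value among the samples keeps the p-value above zero.
    return (at_least + 1) / (n_samples + 1)


# Hypothetical data: five of six points fall inside the two segments.
segments = [(1000, 2000), (5000, 6000)]
points = [1100, 1500, 1900, 5200, 5800, 70000]
p_value = monte_carlo_p_value(points, segments, genome_length=100_000)
print(p_value)
```

The choice of null model matters: preserving other properties of the data, such as the spacing between points, would give a different reference distribution and possibly a different p-value.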